Individual Poster Page

See copyright notice at the bottom of this page.


How are Runs Really Created

August 12, 2002 - Rob Wood

So you are saying that the "value" of any event depends on the underlying run environment. But it also surely depends on the specific base-out situation, the identity of the batter, who is on deck, etc. Simulations over zillions of such situations are used to estimate the "average" value. Accordingly, this approach necessarily smears a whole bunch of disparate situations into one number: for example, a single is worth 0.46 runs.

I wonder how much variability there is in these event values across the different possible situations. A base-out breakdown should be fairly easy to do, and you may have already done the analysis.
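A base-out breakdown can be sketched with a run-expectancy (RE) table, valuing an event as RE(after) minus RE(before) plus the runs scored on the play. The sketch below uses the home run because it is the one event whose result needs no assumptions about runner advancement: every runner scores and the bases end up empty with the outs unchanged. The RE numbers are hypothetical placeholders with a plausible shape, not Tango's figures, so only the method carries over.

```python
# HYPOTHETICAL run-expectancy table: RE[bases][outs] = average runs scored from
# this base-out state to the end of the inning. PLACEHOLDER numbers only; swap in
# a real RE table before drawing any conclusions.
RE = {
    "___": [0.50, 0.27, 0.11],
    "1__": [0.90, 0.53, 0.23],
    "_2_": [1.14, 0.69, 0.33],
    "__3": [1.38, 0.95, 0.37],
    "12_": [1.50, 0.92, 0.45],
    "1_3": [1.78, 1.14, 0.49],
    "_23": [1.98, 1.40, 0.58],
    "123": [2.30, 1.54, 0.75],
}

def hr_run_value(bases, outs):
    """Run value of a home run in one base-out state:
    RE(after) - RE(before) + runs scored on the play."""
    runs_scored = 1 + sum(ch != "_" for ch in bases)  # the batter plus every runner
    # After a homer the bases are empty and the number of outs is unchanged.
    return RE["___"][outs] - RE[bases][outs] + runs_scored

# Value in each of the 24 base-out states, keyed by (bases, outs).
values = {(b, o): hr_run_value(b, o) for b in RE for o in range(3)}
low, high = min(values.values()), max(values.values())
print(f"HR value ranges from {low:.2f} to {high:.2f} runs across 24 states")
```

With bases empty the home run is always worth exactly one run, whatever the RE table says; the spread above that comes entirely from the runners on base and the outs.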

It may also be interesting to look at where in the lineup the batter hits (for example, whether the cleanup hitter or the woeful-hitting pitcher is up next). Maybe you could break the lineup into groups: 1-2, 3-4-5, and 6-7-8-9 for the AL, and 1-2, 3-4-5, 6-7-8, and 9 for the NL.

The basic question is how good any specific estimate of an event's run value is. If the true run value varies widely from situation to situation, then we would place less credibility in any one formula (there would be a fair amount of variance around the estimate). I realize that the runs created formulas do a good job of predicting actual runs scored, so something must be at work to help. But it may be that events occur often enough to "even out" over the course of a season and over an entire team. If that were the case, the variance around any one player's runs created could still be rather high.
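One way to see why a team total could be reliable while a single player's total is not: if each plate appearance's deviation from its average run value were independent, the noise in a season total would grow like the square root of the number of plate appearances while the total itself grows linearly. The independence assumption and every input below are hypothetical; real context effects are correlated (lineup slot, teammates), so this sketch is, if anything, optimistic.

```python
import math

def relative_context_error(per_pa_sd, runs_per_pa, n_pa):
    """Context noise in a season runs total, as a fraction of that total.

    ASSUMPTION: each plate appearance's deviation from its average run value is
    independent with standard deviation per_pa_sd. The total's standard
    deviation then grows like sqrt(n_pa) while the total grows like n_pa.
    """
    return (per_pa_sd * math.sqrt(n_pa)) / (runs_per_pa * n_pa)

# Hypothetical inputs, for illustration only: one regular vs. a whole team.
player = relative_context_error(0.25, 0.12, 650)
team = relative_context_error(0.25, 0.12, 6200)
print(f"player {player:.1%}, team {team:.1%}")
```

Under these assumptions the team's relative error is smaller by a factor of sqrt(6200 / 650), about 3, regardless of the per-PA inputs chosen.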

I further realize that the main goal of this approach is to derive the single best linear weights formula (using only seasonal, situation-independent stats) to estimate each player's offensive contribution. But it appears that you have all the ingredients to investigate whether this approach could be significantly improved by considering other detailed information as well. Recall that in his Win Shares system, Bill James adjusts a hitter's runs created using his batting average with runners in scoring position and his home runs with runners on base.

Thanks much.


How are Runs Really Created

August 13, 2002 - Rob Wood

Thanks, Tango, for the link to the base-out run values. As expected, there is a fair amount of variability in the values across the different base-out situations.

That brings me back to an issue I raised in my previous post. Have you used (or could you use) your simulator to estimate the variability around a player's run expectancy figure? Say a player creates 100 runs by the linear weights formula. Is the "actual" number of runs he contributed 100 +/- 2 or 100 +/- 20? You could use your simulator to derive these error bars for various levels of RE.

I think that would be an important finding.


How are Runs Really Created

August 14, 2002 - Rob Wood

Here is what I was thinking. The standard linear weights (runs created, etc.) formulas use average run values over all the disparate contexts a batter faces during a season. Let's just talk about the 24 base-out situations, though other information could conceivably also be taken into account. Tango has shown that there is quite a bit of variability in the run values of events across the different contexts, as we all intuitively understand.

Thus, when it is reported that a player had Y linear weights runs (or runs created, etc.), we know that this is an estimate of his actual run contribution, which could be measured more accurately by looking at the player's performance in each of the 24 base-out contexts. This is what Tango did for some of the 2001 MVP candidates in the linked article.
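The gap between the two measures can be sketched directly: a context-aware total values each event in the state where it happened, while a linear weights total values every event at one league-average weight. The two coarse contexts and all numbers below are hypothetical placeholders standing in for the 24 base-out states, and the helper names are illustrative only.

```python
# HYPOTHETICAL example: two coarse contexts instead of the 24 base-out states.
RUN_VALUE = {"bases_empty": {"1B": 0.25}, "runners_on": {"1B": 0.65}}  # placeholder values
LEAGUE_SHARE = {"bases_empty": 0.6, "runners_on": 0.4}                 # placeholder PA mix

def context_runs(counts):
    """Value every event in the state where it actually happened."""
    return sum(n * RUN_VALUE[s][e] for s, row in counts.items() for e, n in row.items())

def linear_weights_runs(counts):
    """Value every event at one league-average weight, ignoring the state."""
    weight = {e: sum(LEAGUE_SHARE[s] * RUN_VALUE[s][e] for s in LEAGUE_SHARE)
              for e in RUN_VALUE["bases_empty"]}
    return sum(n * weight[e] for row in counts.values() for e, n in row.items())

# A player who came up with runners on more often than the league mix.
player = {"bases_empty": {"1B": 10}, "runners_on": {"1B": 30}}
print(context_runs(player), linear_weights_runs(player))
```

The two totals agree whenever the player's mix of states matches the league's (for example 24 singles with the bases empty and 16 with runners on); they drift apart as his mix departs from it.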

I am hoping that you can use your simulator to estimate the inherent variability (due to context) in the linear weights runs estimates. I would suggest fixing a player's seasonal batting line (so many singles, doubles, triples, home runs, etc.). Then put this player into zillions of different team contexts, different places in the lineup, etc., and for each plate appearance randomly select one of his fixed outcomes.

[As a technical aside, I would suggest doing the random selection without replacement so as to have the exact same number of events in his seasonal batting line in each simulation trial.]
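The difference between the two ways of drawing outcomes can be shown in a few lines: drawing with replacement lets the home run count wander from trial to trial, while shuffling the fixed list (drawing without replacement) keeps every count exactly as in the seasonal line. The batting line below is hypothetical.

```python
import random
from collections import Counter

# A fixed, HYPOTHETICAL seasonal batting line (650 plate appearances).
LINE = {"1B": 110, "2B": 30, "3B": 5, "HR": 25, "BB": 60, "OUT": 420}

def season_outcomes(line):
    """Expand the line into one entry per plate appearance."""
    return [event for event, n in line.items() for _ in range(n)]

def with_replacement(line, rng):
    """Each PA picks any outcome independently; counts vary by trial."""
    outcomes = season_outcomes(line)
    return [rng.choice(outcomes) for _ in outcomes]

def without_replacement(line, rng):
    """Same events in a new random order; counts never change."""
    outcomes = season_outcomes(line)
    rng.shuffle(outcomes)
    return outcomes

rng = random.Random(2002)
print(Counter(with_replacement(LINE, rng))["HR"])     # varies around 25
print(Counter(without_replacement(LINE, rng))["HR"])  # always exactly 25
```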

Then use your RE machinery to calculate the player's actual run contribution for each trial. Do zillions of trials. Derive the sampling distribution, if you want to call it that, of the player's run contribution. Of course, the mean of this distribution should be the original linear weights runs we started with. However, there will be a fair amount of variability around this value. How large that variability is would, I think, be extremely interesting to investigate.
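A much-simplified version of this experiment is sketched below. Each trial keeps the player's exact seasonal line and drops each plate appearance into a randomly drawn context; the spread of the trial totals is the error bar being asked about. Everything here is an assumption: three coarse contexts stand in for the 24 base-out states, the run values and context shares are placeholders, and contexts are drawn independently rather than following from previous plays as they would in a real game simulator.

```python
import random
import statistics

# HYPOTHETICAL context-specific run values, RUN_VALUE[context][event], measured
# relative to average (so outs are negative). Placeholders with a plausible shape
# only; a real study would use all 24 base-out states from an RE table.
RUN_VALUE = {
    "bases_empty": {"1B": 0.27, "2B": 0.60, "3B": 0.85, "HR": 1.00, "BB": 0.27, "OUT": -0.22},
    "runner_on_1": {"1B": 0.42, "2B": 0.85, "3B": 1.15, "HR": 1.60, "BB": 0.42, "OUT": -0.35},
    "scoring_pos": {"1B": 0.75, "2B": 0.95, "3B": 1.20, "HR": 1.75, "BB": 0.25, "OUT": -0.45},
}
# HYPOTHETICAL share of plate appearances that begin in each context.
CONTEXT_SHARE = {"bases_empty": 0.55, "runner_on_1": 0.25, "scoring_pos": 0.20}

def linear_weights_runs(line):
    """Context-free estimate: each event valued at its share-weighted average."""
    return sum(
        n * sum(CONTEXT_SHARE[c] * RUN_VALUE[c][e] for c in CONTEXT_SHARE)
        for e, n in line.items()
    )

def simulate_season(line, rng):
    """One trial: the exact seasonal line, each PA placed in a random context.

    SIMPLIFICATION: contexts are drawn independently of one another.
    """
    outcomes = [e for e, n in line.items() for _ in range(n)]
    contexts = rng.choices(list(CONTEXT_SHARE),
                           weights=list(CONTEXT_SHARE.values()), k=len(outcomes))
    return sum(RUN_VALUE[c][e] for c, e in zip(contexts, outcomes))

def context_distribution(line, trials=10_000, seed=0):
    """Mean and standard deviation of the player's run total across trials."""
    rng = random.Random(seed)
    totals = [simulate_season(line, rng) for _ in range(trials)]
    return statistics.mean(totals), statistics.stdev(totals)

LINE = {"1B": 110, "2B": 30, "3B": 5, "HR": 25, "BB": 60, "OUT": 420}
avg_runs, sd_runs = context_distribution(LINE, trials=2_000)
print(f"linear weights {linear_weights_runs(LINE):.1f}, "
      f"simulated {avg_runs:.1f} +/- {sd_runs:.1f}")
```

In this toy model the simulated mean lands on the linear weights figure, as the post expects, because the weights are the share-weighted averages of the same context values; the standard deviation is the context-driven uncertainty around it.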

Maybe get separate estimates of the variability due to the base-out context (holding the team context fixed) and the variability due to different team contexts (holding the base-out situation fixed).
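The post does not say how to separate the two sources. One standard option, sketched here under that assumption, is the law of total variance: run several base-out draws for each fixed team context, then split the overall variance into the average within-team variance (the base-out part) and the variance of the team means (the team-context part). The helper name and toy totals are hypothetical.

```python
import statistics

def decompose_variance(results):
    """Split the spread of simulated run totals into two parts.

    results[t] is a list of run totals from repeated base-out draws with team
    context t held fixed; every team context should have the same number of draws.
    Law of total variance (population variances, so the parts add up, up to rounding):
        total = mean within-team variance  (base-out part)
              + variance of the team means (team-context part)
    """
    within = statistics.mean([statistics.pvariance(r) for r in results])
    between = statistics.pvariance([statistics.mean(r) for r in results])
    total = statistics.pvariance([x for r in results for x in r])
    return within, between, total

# Toy totals for two hypothetical team contexts, three base-out draws each.
print(decompose_variance([[98.0, 100.0, 102.0], [104.0, 106.0, 108.0]]))
```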


How are Runs Really Created - Third Installment

September 16, 2002 - Rob Wood

Does the superiority of the new methods over runs created have anything to do with the argument that runs created estimates runs scored, whereas the linear weights methods estimate runs scored above average (and therefore use more information)? I am not "taking sides" here; I'm just curious whether they use the same information.
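The distinction can be made concrete with one batting line. Bill James's basic Runs Created, (H + BB) × TB / (AB + BB), produces an absolute run total, while a linear weights sum with a negative value for outs comes out relative to average. The 0.46 for a single is the figure quoted earlier on this page; the other weights are assumed values of roughly the usual size and should be treated as placeholders, not as any published set.

```python
def basic_runs_created(h, bb, tb, ab):
    """Bill James's basic Runs Created: (H + BB) * TB / (AB + BB). Absolute runs."""
    return (h + bb) * tb / (ab + bb)

# ASSUMED linear weights (illustrative). The negative out weight makes the sum
# come out relative to league average rather than as absolute runs.
WEIGHTS = {"1B": 0.46, "2B": 0.80, "3B": 1.02, "HR": 1.40, "BB": 0.33, "OUT": -0.25}

def linear_weights_runs(line):
    return sum(WEIGHTS[e] * n for e, n in line.items())

# HYPOTHETICAL batting line; OUT here means AB - H.
line = {"1B": 120, "2B": 35, "3B": 5, "HR": 20, "BB": 60, "OUT": 420}
h = line["1B"] + line["2B"] + line["3B"] + line["HR"]
tb = line["1B"] + 2 * line["2B"] + 3 * line["3B"] + 4 * line["HR"]
ab = h + line["OUT"]
print(round(basic_runs_created(h, line["BB"], tb, ab), 1))  # 103.6
print(round(linear_weights_runs(line), 1))                  # 31.1
```

The two numbers answer different questions (about 104 runs produced versus about 31 runs above average), so they are not directly comparable without converting one scale to the other.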


How are Runs Really Created - Third Installment

September 17, 2002 - Rob Wood

Tango, can you post the complete formula you are using? Or maybe it's a series of formulas? Thanks much.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 2:17 p.m., August 22, 2003 (#36) - Rob Wood
Yes, this issue is confusing because of the different approaches taken by classical statistics and Bayesian statistics. Ed's post #23 does a good job of describing them.

I have always taken "regression to the mean" to be related to the Bayesian updating approach. That is, updating our best guess as to the player's true ability level, taking into account the league average, the player's observed average, and his number of plate appearances.

Lastly, I think that 10,000 plate appearances has got to be a large enough sample for us to be pretty darned sure of the player's true ability. I cannot believe that even with 10,000 PAs, we would need to regress 5 or 10% toward the league average. I'll try to dig up the standard updating formulas for this case to see what the regression percentages are at different sample sizes.


Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 11:52 p.m., August 25, 2003 (#46) - Rob Wood
I programmed the Bayesian updating formula, and here are the results.

At bats   Pct to regress toward league average (prior mean)
    500   38.9%
   1000   24.1%
   2000   13.7%
   3000    9.6%
   4000    7.4%
   5000    6.0%
   6000    5.0%
   7000    4.3%
   8000    3.8%
   9000    3.4%
  10000    3.1%

Hope this helps.
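For a batting average with a conjugate beta prior (or its normal approximation), the posterior mean is (n × observed + k × league) / (n + k), which pulls the observed average a fraction k / (n + k) of the way back to the prior mean, where k is the prior's weight expressed in at bats. The post does not say which prior was used; a k of about 318 at bats reproduces every row of the table above to the printed precision, so the sketch below adopts that value as an inferred assumption.

```python
# ASSUMPTION: a prior "worth" about 318 at bats. This constant is inferred by
# fitting the table above; the post itself does not state it.
PRIOR_AB = 318

def regression_pct(at_bats, prior_ab=PRIOR_AB):
    """Fraction of the gap between observed and league average to regress away."""
    return prior_ab / (at_bats + prior_ab)

def regressed_average(observed, league, at_bats, prior_ab=PRIOR_AB):
    """Posterior-mean estimate: observed average pulled toward the league average."""
    return observed - regression_pct(at_bats, prior_ab) * (observed - league)

for ab in (500, 1000, 2000, 5000, 10000):
    print(ab, f"{regression_pct(ab):.1%}")
```

For example, a .300 hitter over 500 at bats in a .260 league would be estimated at about .284 under this assumed prior.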


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 6:17 p.m., August 28, 2003 (#2) - Rob Wood
Do the +/- 6 run leads in the table also include leads of more than 6 runs? I am guessing that they do. Also, I have fiddled around with some of the numbers, and the counts do not seem to "reconcile". I will refrain from posting any specific (alleged) anomalies until I play around some more.

I'd also like to extend thanks to Phil B. for gathering this data.


Copyright notice

Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remains the sole copyright of the author of those comments.

If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.